{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "Y_true = np.array([1,0,0,0,1,0,1,0,])\n",
    "Y_pred = np.array([0.9, 0.8, 0.3, 0.1,0.4,0.9,0.66,0.7])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AUC基础"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_pos=np.sum(Y_true)    # 正样本的数量\n",
    "n_neg=len(Y_true)-n_pos    # 负样本的数量\n",
    "denominator=n_pos*n_neg    # 总的组合数，分母"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用分箱来模拟一个离散的ranking："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_bins=100\n",
    "bin_width=1/n_bins\n",
    "\n",
    "# 某一rank值下对应的正负样本数\n",
    "pos_ranking=[0 for _ in range(n_bins)]\n",
    "neg_ranking=[0 for _ in range(n_bins)]\n",
    "\n",
    "# 对所有rank的正负样本计数\n",
    "for idx,label in enumerate(Y_true):\n",
    "    rank=int(Y_pred[idx]/bin_width)    # 计算概率值对应的rank\n",
    "    if label==1:\n",
    "        pos_ranking[rank]+=1\n",
    "    else:\n",
    "        neg_ranking[rank]+=1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "由于是离散的rank，那么可以由低往高开始计算满足$P_{pos}>P_{neg}$的正负样本对。对于最低rank的正样本，可以进行的组合数肯定为$0$，而对于rank为倒数第二的正样本，可以进行的组合数为```pos_ranking[-2]*neg_ranking[-1]```，而对于rank为倒数第三的正样本，合法的组合数为```pos_ranking[-3]*(neg_ranking[-1]+neg_ranking[-2])```。对于rank值相等的正负样本对，我们将其看成是半个样本对。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "acc_neg=0    # 由低往高开始计数，那么可以与正样本组合的负样本数是不断累加的\n",
    "legal_pair=0    # 满足P_pos>P_neg的正负样本对\n",
    "for rank in range(n_bins):\n",
    "    legal_pair+=(pos_ranking[rank]*acc_neg+pos_ranking[rank]*neg_ranking[rank]*0.5)\n",
    "    acc_neg+=neg_ranking[rank]\n",
    "    \n",
    "AUC=legal_pair/denominator\n",
    "\n",
    "print(AUC)"
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
